AITopics | Visakhapatnam

The emergence of small vision-language models (sVLMs) marks a critical advancement in multimodal AI, enabling efficient processing of visual and textual data in resource-constrained environments. This survey offers a comprehensive exploration of sVLM development, presenting a taxonomy of architectures - transformer-based, mamba-based, and hybrid - that highlight innovations in compact design and computational efficiency. Techniques such as knowledge distillation, lightweight attention mechanisms, and modality pre-fusion are discussed as enablers of high performance with reduced resource requirements. Through an in-depth analysis of models like TinyGPT-V, MiniGPT-4, and VL-Mamba, we identify trade-offs between accuracy, efficiency, and scalability. Persistent challenges, including data biases and generalization to complex tasks, are critically examined, with proposed pathways for addressing them. By consolidating advancements in sVLMs, this work underscores their transformative potential for accessible AI, setting a foundation for future research into efficient multimodal systems.

architecture, dataset, language model, (14 more...)

arXiv.org Artificial Intelligence

2503.10665

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Switzerland (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.46)

Industry:

Health & Medicine (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)

Zipfian Whitening

Yokoi, Sho, Bao, Han, Kurita, Hiroto, Shimodaira, Hidetoshi

arXiv.org Machine Learning, Nov-1-2024

The word embedding space in neural models is skewed, and correcting this can improve task performance. We point out that most approaches for modeling, correcting, and measuring the symmetry of an embedding space implicitly assume that the word frequencies are uniform; in reality, word frequencies follow a highly non-uniform distribution, known as Zipf's law. Surprisingly, simply performing PCA whitening weighted by the empirical word frequency that follows Zipf's law significantly improves task performance, surpassing established baselines. From a theoretical perspective, both our approach and existing methods can be clearly categorized: word representations are distributed according to an exponential family with either uniform or Zipfian base measures. By adopting the latter approach, we can naturally emphasize informative low-frequency words in terms of their vector norm, which becomes evident from the information-geometric perspective, and in terms of the loss functions for imbalanced classification. Additionally, our theory corroborates that popular natural language processing methods, such as skip-gram negative sampling, WhiteningBERT, and headless language models, work well just because their word embeddings encode the empirical word frequency into the underlying probabilistic model.
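
The frequency-weighted PCA whitening described in this abstract can be illustrated with a minimal sketch. It assumes NumPy is available; the function name `zipfian_whiten`, the `eps` floor, and the synthetic Zipf-like frequencies are illustrative choices, not the authors' released code. The only change from ordinary PCA whitening is that the mean and covariance are expectations under the empirical word frequency rather than uniform averages over the vocabulary.

```python
import numpy as np

def zipfian_whiten(embeddings, freqs, eps=1e-12):
    """PCA-whiten word vectors using word-frequency weights (illustrative sketch)."""
    p = np.asarray(freqs, dtype=float)
    p = p / p.sum()                                # normalize to a probability vector
    W = np.asarray(embeddings, dtype=float)        # shape: (vocab_size, dim)
    mu = p @ W                                     # frequency-weighted mean vector
    centered = W - mu
    cov = centered.T @ (centered * p[:, None])     # frequency-weighted covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # symmetric eigendecomposition
    scale = 1.0 / np.sqrt(np.maximum(eigvals, eps))
    return (centered @ eigvecs) * scale            # rotate, then rescale each axis

# Synthetic example: correlated, shifted vectors with Zipf-like frequencies (1 / rank).
rng = np.random.default_rng(0)
demo_embeddings = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8)) + 3.0
demo_freqs = 1.0 / np.arange(1, 1001)
demo_whitened = zipfian_whiten(demo_embeddings, demo_freqs)
```

After this transform the embeddings have zero mean and identity covariance when both are measured under the word-frequency distribution; passing uniform `freqs` recovers standard PCA whitening.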

frequency, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2411.0068

Country:

Asia > Japan > Honshū > Tōhoku (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > Dominican Republic (0.04)
(14 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs

Zhou, Yujun, Yang, Jingdong, Guo, Kehan, Chen, Pin-Yu, Gao, Tian, Geyer, Werner, Moniz, Nuno, Chawla, Nitesh V, Zhang, Xiangliang

arXiv.org Artificial Intelligence, Oct-18-2024

Laboratory accidents pose significant risks to human life and property, underscoring the importance of robust safety protocols. Despite advancements in safety training, laboratory personnel may still unknowingly engage in unsafe practices. With the increasing reliance on large language models (LLMs) for guidance in various fields, including laboratory settings, there is a growing concern about their reliability in critical safety-related decision-making. Unlike trained human researchers, LLMs lack formal lab safety education, raising questions about their ability to provide safe and accurate guidance. Existing research on LLM trustworthiness primarily focuses on issues such as ethical compliance, truthfulness, and fairness but fails to fully cover safety-critical real-world applications, like lab safety. To address this gap, we propose the Laboratory Safety Benchmark (LabSafety Bench), a comprehensive evaluation framework based on a new taxonomy aligned with Occupational Safety and Health Administration (OSHA) protocols. This benchmark includes 765 multiple-choice questions verified by human experts, assessing the performance of LLMs and vision language models (VLMs) in lab safety contexts. Our evaluations demonstrate that while GPT-4o outperforms human participants, it is still prone to critical errors, highlighting the risks of relying on LLMs in safety-critical environments. Our findings emphasize the need for specialized benchmarks to accurately assess the trustworthiness of LLMs in real-world safety applications.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.14182

Country:

North America > United States (1.00)
Asia > India > Andhra Pradesh > Visakhapatnam (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education > Educational Setting (1.00)
Government > Regional Government > North America Government > United States Government (0.88)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

EB-NeRD: A Large-Scale Dataset for News Recommendation

Kruse, Johannes, Lindskow, Kasper, Kalloori, Saikishore, Polignano, Marco, Pomo, Claudio, Srivastava, Abhishek, Uppal, Anshuk, Andersen, Michael Riis, Frellsen, Jes

arXiv.org Artificial Intelligence, Oct-4-2024

Personalized content recommendations have been pivotal to the content experience in digital media, from video streaming to social networks. However, several domain-specific challenges have held back the adoption of recommender systems in news publishing. To address these challenges, we introduce the Ekstra Bladet News Recommendation Dataset (EB-NeRD). The dataset encompasses data from over a million unique users and more than 37 million impression logs from Ekstra Bladet. It also includes a collection of over 125,000 Danish news articles, complete with titles, abstracts, bodies, and metadata, such as categories. EB-NeRD served as the benchmark dataset for the RecSys '24 Challenge, where it was demonstrated how the dataset can be used to address both technical and normative challenges in designing effective and responsible recommender systems for news publishing. The dataset is available at: https://recsys.eb.dk.

computing machinery, dataset, proceedings, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3687151.3687152

2410.03432

Country:

Europe > Denmark > Capital Region > Kongens Lyngby (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(22 more...)

Genre:

Research Report (0.50)
Overview (0.46)

Industry:

Media > News (1.00)
Information Technology > Services (0.66)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

RecSys Challenge 2024: Balancing Accuracy and Editorial Values in News Recommendations

Kruse, Johannes, Lindskow, Kasper, Kalloori, Saikishore, Polignano, Marco, Pomo, Claudio, Srivastava, Abhishek, Uppal, Anshuk, Andersen, Michael Riis, Frellsen, Jes

arXiv.org Artificial Intelligence, Sep-30-2024

The RecSys Challenge 2024 aims to advance news recommendation by addressing both the technical and normative challenges inherent in designing effective and responsible recommender systems for news publishing. This paper describes the challenge, including its objectives, problem setting, and the dataset provided by the Danish news publishers Ekstra Bladet and JP/Politikens Media Group ("Ekstra Bladet"). The challenge explores the unique aspects of news recommendation, such as modeling user preferences based on behavior, accounting for the influence of the news agenda on user interests, and managing the rapid decay of news items. Additionally, the challenge embraces normative complexities, investigating the effects of recommender systems on news flow and their alignment with editorial values. We summarize the challenge setup, dataset characteristics, and evaluation metrics. Finally, we announce the winners and highlight their contributions. The dataset is available at: https://recsys.eb.dk.

editorial value, recommender system, recsy challenge 2024, (9 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3640457.3687164

2409.20483

Country:

Europe > Denmark > Capital Region > Kongens Lyngby (0.15)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Italy > Apulia > Bari (0.06)
(8 more...)

Genre:

Personal > Honors (0.47)
Research Report (0.40)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Multi-Agent Obstacle Avoidance using Velocity Obstacles and Control Barrier Functions

Roncero, Alejandro Sánchez, Muchacho, Rafael I. Cabral, Ögren, Petter

arXiv.org Artificial Intelligence, Sep-16-2024

Velocity Obstacles (VO) methods form a paradigm for collision avoidance strategies among moving obstacles and agents. While VO methods perform well in simple multi-agent environments, they do not guarantee safety and can show overly conservative behavior in common situations. In this paper, we propose to combine a VO strategy for guidance with a Control Barrier Function (CBF) approach for safety, which overcomes the overly conservative behavior of VOs and formally guarantees safety. We validate our method in a baseline comparison study, using second-order integrator and car-like dynamics. Results support that our method outperforms the baselines with respect to path smoothness, collision avoidance, and success rates.
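
The idea of layering a CBF safety filter on top of a guidance command can be sketched in a few lines. This is a deliberately simplified illustration, not the paper's controller: it assumes a single agent with single-integrator dynamics (the paper uses second-order integrator and car-like dynamics), one static circular obstacle, and a plain go-to-goal command standing in for the VO guidance. The names `cbf_safety_filter`, `simulate`, and the gain `alpha` are hypothetical. With barrier h(p) = |p - p_o|^2 - r^2, the filter returns the velocity closest to the nominal one that satisfies grad_h . u >= -alpha * h, which for a single linear constraint has a closed form.

```python
import math

def cbf_safety_filter(pos, obstacle, radius, u_nom, alpha=1.0):
    """Return the velocity closest to u_nom that satisfies the CBF condition."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    h = dx * dx + dy * dy - radius * radius        # barrier: h >= 0 means safe
    a = (2.0 * dx, 2.0 * dy)                       # gradient of h w.r.t. position
    b = -alpha * h                                 # required: a . u >= b
    lhs = a[0] * u_nom[0] + a[1] * u_nom[1]
    if lhs >= b:
        return u_nom                               # nominal command is already safe
    norm_sq = a[0] ** 2 + a[1] ** 2
    if norm_sq == 0.0:
        raise ValueError("agent is at the obstacle centre; gradient is undefined")
    lam = (b - lhs) / norm_sq                      # projection onto the half-space
    return (u_nom[0] + lam * a[0], u_nom[1] + lam * a[1])

def simulate(start=(-2.0, 0.1), goal=(2.0, 0.0), obstacle=(0.0, 0.0),
             radius=0.5, dt=0.01, steps=2000, max_speed=1.0):
    """Euler-integrate the filtered go-to-goal command; report final pos and min h."""
    pos, min_h = start, float("inf")
    for _ in range(steps):
        gx, gy = goal[0] - pos[0], goal[1] - pos[1]
        dist = math.hypot(gx, gy)
        scale = min(1.0, max_speed / dist) if dist > 0 else 0.0
        u = cbf_safety_filter(pos, obstacle, radius, (gx * scale, gy * scale))
        pos = (pos[0] + dt * u[0], pos[1] + dt * u[1])
        h = (pos[0] - obstacle[0]) ** 2 + (pos[1] - obstacle[1]) ** 2 - radius ** 2
        min_h = min(min_h, h)
    return pos, min_h
```

Calling `simulate()` returns the final position and the smallest barrier value seen along the trajectory; a non-negative minimum means the agent never entered the obstacle disc. The filter leaves the guidance command untouched whenever it already satisfies the constraint, which is how such a safety layer can avoid adding conservatism of its own.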

agent, artificial intelligence, constraint, (12 more...)

arXiv.org Artificial Intelligence

2409.10117

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Modeling Urban Transport Choices: Incorporating Sociocultural Aspects

Salazar-Serna, Kathleen, Cadavid, Lorena, Franco, Carlos J.

arXiv.org Artificial Intelligence, Jul-30-2024

By understanding how users decide on their commuting modes, it is possible to identify factors that can be influenced to change travel behavior and promote the adoption of more sustainable transportation modes. Agent-based modeling (ABM) is particularly valuable for this purpose, as it can represent complex systems like transportation and identify emerging collective behaviors resulting from the autonomous decisions of transport users interacting with one another and with the environment (Kagho, Balac, and Axhausen 2020). These capabilities make ABM suitable for analyzing the impacts of transport policies (Wise, Crooks, and Batty 2017). However, the application of ABM to transport mode choice has been limited, and studies have been conducted predominantly in developed countries (Cadavid and Salazar-Serna 2021; Salazar-Serna, Cadavid, Franco, and Carley 2023). Their findings may not extend seamlessly to developing regions due to different contextual policy needs and the distinct ways socioeconomic and cultural factors influence human behavior (Carley 1991; Salazar-Serna et al. 2023). Therefore, policies that have been successful in one setting might not achieve similar outcomes in another. Previous studies in transportation have identified various determinants affecting mode choice. These factors can be grouped into several categories: sociodemographic characteristics such as age, sex, occupation, and income level (Ashalatha et al. 2013); travel habits including distance traveled, travel time, origin-destination pairs, and trip purpose (Madhuwanthi et al. 2016); and attributes of the built environment like design, density, and capacity (Ewing and Cervero 2010). Additionally, attitudes and perceptions regarding transport modes, which cover aspects such as comfort, cost, security, safety, quality, and reliability, play a crucial role (Fu 2021).

mode choice, motorcycle, salazar-serna, (16 more...)

arXiv.org Artificial Intelligence

2407.21307

Country:

South America > Colombia > Valle del Cauca Department > Cali (0.04)
North America > Central America (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(9 more...)

Genre:

Questionnaire & Opinion Survey (0.93)
Research Report > Experimental Study (0.68)
Research Report > New Finding (0.68)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.73)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Accelerating Drug Safety Assessment using Bidirectional-LSTM for SMILES Data

Rao, K. Venkateswara, Rao, Kunjam Nageswara, Ratnam, G. Sita

arXiv.org Artificial Intelligence, Jul-8-2024

Computational methods are useful in accelerating the pace of drug discovery. Drug discovery involves several steps, such as target identification and validation, lead discovery, and lead optimisation. In the lead optimisation phase, the absorption, distribution, metabolism, excretion, and toxicity properties of lead compounds are assessed. This work addresses the prediction of toxicity and solubility for lead compounds represented in Simplified Molecular Input Line Entry System (SMILES) notation. Among the different approaches that work on SMILES data, the proposed model was built using a sequence-based approach. The proposed Bi-Directional Long Short Term Memory (BiLSTM) network, a variant of the Recurrent Neural Network (RNN), processes input molecular sequences in both forward and backward directions to examine the structural features of molecules comprehensively. The proposed work aims to capture the sequential patterns encoded in the SMILES strings, which are then used to predict the toxicity of the molecules. On the ClinTox dataset, the proposed model surpasses previous approaches such as Trimnet and Pre-training Graph Neural Networks (GNN) by achieving a ROC accuracy of 0.96. On the FreeSolv dataset, BiLSTM outperforms the previous model with a low RMSE value of 1.22 in solubility prediction.
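
A sequence model such as a BiLSTM consumes SMILES strings as sequences of token ids, so the input preparation step can be sketched as follows. This covers only tokenization and padding, not the network itself, and it is an assumption about a typical pipeline rather than the authors' preprocessing: the regular expression, the `<pad>` and `<unk>` tokens, and the function names are all hypothetical. The pattern keeps bracket atoms such as `[Na+]` and the two-letter halogens `Cl` and `Br` as single tokens and treats every other character as its own token.

```python
import re

# Bracket atoms, two-letter halogens, two-digit ring closures, then any single character.
_TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

def tokenize(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    return _TOKEN_RE.findall(smiles)

def build_vocab(smiles_list):
    """Map each token seen in the corpus to an integer id (0 = pad, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for token in sorted({t for s in smiles_list for t in tokenize(s)}):
        vocab[token] = len(vocab)
    return vocab

def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of token ids."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokenize(smiles)][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

# Example corpus: ethanol and acetyl chloride.
example_vocab = build_vocab(["CCO", "CC(=O)Cl"])
example_ids = encode("CCO", example_vocab, max_len=5)
```

The resulting fixed-length id sequences are what an embedding layer followed by forward and backward recurrent passes would read; truncating at `max_len` is a simplification that would discard part of longer molecules.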

information, prediction, toxicity, (15 more...)

arXiv.org Artificial Intelligence

2407.18919

Country:

North America > United States (0.30)
Asia > India > Andhra Pradesh > Visakhapatnam (0.04)
Europe > United Kingdom > England > West Yorkshire (0.04)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Metals & Mining > Lead (0.74)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
